Abstract. A hypergraph $\mathcal{F}$ is a set family defined on a vertex set $V$. The
dual of $\mathcal{F}$ is the set of minimal subsets $H$ of $V$ such that $F \cap H \neq \emptyset$ for any
$F \in \mathcal{F}$. The computation of the dual is equivalent to many problems,
such as minimal hitting set enumeration of a subset family, minimal 
set cover enumeration, and the enumeration of hypergraph transversals. 
Although many algorithms have been proposed for solving the problem, 
to the best of our knowledge, none of them can work on large-scale 
input with a large number of output minimal hitting sets. This paper 
focuses on developing time- and space-efficient algorithms for solving 
the problem. We propose two new algorithms with new search methods, 
new pruning methods, and fast techniques for the minimality check. The 
computational experiments show that our algorithms are quite fast even 
for large-scale input for which existing algorithms do not terminate in a 
practical time. 



1 Introduction 

A hypergraph $\mathcal{F}$ is a subset family defined on a vertex set $V$; that is, each element
(called a hyperedge) $F$ of $\mathcal{F}$ is a subset of $V$. The hypergraph is a generalization of
a graph in which edges can have more than two vertices. A hitting set is a subset
$H$ of $V$ such that $H \cap F \neq \emptyset$ for any hyperedge $F \in \mathcal{F}$. A hitting set is called
minimal if it includes no other hitting set. The dual of a hypergraph is the set
of all minimal hitting sets. The dualization of a hypergraph is to construct the
dual of a given hypergraph.

Dualization is a fundamental problem in computer science, especially in machine
learning, data mining, and optimization. It is equivalent to (1) the
minimal hitting set enumeration of a given subset family, (2) minimal set cover
enumeration of a given set family, (3) enumeration of hypergraph transversals, (4)
enumeration of minimal subsets that are not included in any member of a given set
family, etc. One of the research goals is to clarify the existence of a polynomial
time algorithm for solving the problem. The size of the dual can be exponential in
the size of the input hypergraph, thus a polynomial time algorithm for dualization
usually means an algorithm running in time polynomial in the input size and the
output size. Although Khachiyan et al. [6] developed a quasi-polynomial time
algorithm which runs in $O(N^{\log N})$ time, where $N$ is the input size plus the output size,
the existence of a polynomial time algorithm is still an open question.



Because of the importance of dualization in applications, a lot of research
has aimed at algorithms that terminate in a short time on real-world data. The
size of the dual can be exponential, but in practice it is often huge yet not intractable.
Thus, practically efficient algorithms aim to spend only a short time on each minimal
hitting set. Reduction of the search space was studied as a way to cope with
this problem [4,6-8, 12, 15]. Finding one minimal hitting set is easy; starting from
any hitting set, one removes vertices one by one as long as the remaining set still
intersects every hyperedge. However, finding exactly all minimal hitting sets is not
easy; we have to check a great many vertex subsets that can be minimal hitting sets.
The past studies have succeeded in reducing the search space, but the computational
cost was substantial, hence the current algorithms may take a long time when the size of
the dual is large.
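
The claim that a single minimal hitting set is easy to find can be illustrated with a
minimal Python sketch. It is not part of the algorithms proposed in this paper; the
function name shrink_to_minimal and the example hypergraph are illustrative assumptions.

```python
def shrink_to_minimal(hitting_set, hyperedges):
    """Greedily drop vertices while every hyperedge is still hit.

    Hypothetical helper: `hitting_set` must already intersect every hyperedge.
    One pass is enough, because removing vertices never makes a
    previously rejected removal possible again.
    """
    current = set(hitting_set)
    for v in sorted(hitting_set):
        candidate = current - {v}
        # Keep the removal only if every hyperedge is still intersected.
        if all(candidate & edge for edge in hyperedges):
            current = candidate
    return current


edges = [{1, 2}, {1, 3}, {2, 3, 4}]
result = shrink_to_minimal({1, 2, 3, 4}, edges)
```

The difficulty addressed in the rest of the paper is not this step, but enumerating
all minimal hitting sets without repeating such checks from scratch.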

In this paper, we focus on developing an efficient computation for the case 
of large-scale input data with a large number of minimal hitting sets. We looked 
at the disadvantages of the existing methods and devised new algorithms to 
eliminate them. 

- breadth-first search: A popular search method for dualization is hill climbing,
in which the algorithm starts from the empty set and recursively adds vertices
one by one until it reaches minimal hitting sets. The minimal hitting
sets already found are stored in memory and used to check the minimality.
This minimality check is popular, but its memory usage is so inefficient
that we cannot solve a problem with many minimal hitting sets. We alleviate
this disadvantage by using a depth-first search algorithm together with the
new minimality check algorithm explained below. The algorithm proposed
in [8,9] uses a depth-first search, but its minimality check takes a long time
on large hypergraphs.

- minimality check: The time for the minimality check in a breadth-first search
is short when the hitting sets to be checked are small on average, but will
be long for larger hitting sets (such as size 20 or larger). We alleviate this
disadvantage by using a new algorithm that does not need the hitting sets
that have already been found. We introduce a new concept, called the critical
hyperedge, that characterizes the minimality of hitting sets. Computing
and updating critical hyperedges can be done in a short time, thus we can
check the minimality efficiently.

- pruning: Several algorithms use pruning methods to reduce the search space,
but our experiments show that these pruning methods are not sufficient. We
propose a simple but efficient pruning method. We introduce a lexicographic
depth-first search, and thereby remove vertices that can never be used and
prune branches without necessary vertices. The pruning drastically reduces
the computation time.

- sophisticated use of simple data structures: Not many studies have mentioned
the data structures or how to use them efficiently, despite this being a very
important consideration for reducing the computation time. We use both the
adjacency matrix (characteristic vectors of hyperedges) and doubly linked
lists to speed up the operations of taking intersections and set differences.
This reduces the computation time in the extremely sparse case, the extremely
dense case (using the complement as input), and the case of non-small minimal
hitting sets (over 10 vertices).

The paper is organized as follows. In the following subsections, we explain
the related work and related problems. Section 2 is for preliminaries, and Section
3 describes the existing algorithms. We describe our new algorithms in Section
4 and show the results of computational experiments in Section 5. We conclude
the paper in Section 6.

1.1 Related Work 

There have been several studies on the dualization problem, of which we shall
briefly review the DL, BMR, KS and HBC algorithms. These algorithms are
classified into two types according to their structure: improved versions of the
Berge algorithm [2], and hill-climbing algorithms. The Berge algorithm updates
the set of minimal hitting sets iteratively, by adding hyperedges one by one to
the current partial hypergraph. DL, BMR and KS are algorithms of this
type, and HBC is of the hill-climbing type. In a hill-climbing algorithm, the
candidates for minimal hitting sets are generated by gathering vertices one by one
until a minimality condition is violated. When a candidate becomes a hitting set,
it is a minimal hitting set. The HBC algorithm does this operation in a
breadth-first manner.

The DL algorithm, proposed by Dong and Li [4], is a border-differential 
algorithm for data mining. The main difference from the Berge algorithm is that 
it avoids generating non-minimal hitting sets by increasing the problem size 
incrementally. The DL algorithm starts from an empty hypergraph and adds a 
hyperedge iteratively while updating the set of minimal hitting sets. The sizes 
of the intermediate sets of minimal hitting sets are likely smaller than that of
the original hypergraph, thus we can expect that there will be no combinatorial 
explosion. Experiments on two small UCI datasets [14] have shown that the DL 
algorithm is much faster than their previous algorithm and the level-wise hill 
climbing algorithm. 

In general, the Berge algorithm and DL algorithm are very useful when the
hypergraph has few hyperedges, but for large hypergraphs, they may take a long
time because of many updates. The BMR algorithm, proposed by Bailey et al.
[5], starts from a hypergraph with few vertices with hyperedges restricted to the
vertex set (the vertices not in the current vertex set are removed from the
hyperedges). The hyperedges grow as the vertex set increases. The BMR algorithm
first uses the Berge algorithm to solve the problem of the initial hypergraph, and
then it updates the minimal hitting sets. Note that Hagen tested a version of
the DL algorithm instead of the Berge algorithm [11].

Kavvadias and Stavropoulos's algorithm (KS algorithm) [8, 9] embodies two
ideas: unifying the nodes contained in the same hyperedges, and depth-first
search. These ideas help to reduce the number of intermediate hitting sets and
memory usage. To perform a depth-first search, they use a minimality check
algorithm that does not need other hitting sets: it checks whether the removal of each
vertex results in a hitting set or not. The KS algorithm uses an efficient algorithm
for this task.

Hebert et al. proposed a level-wise algorithm (HBC algorithm) [7]. Their
algorithm is a hill-climbing algorithm which starts from the empty set and adds
vertices one by one. It searches the vertex subsets satisfying a necessary condition
to be a minimal hitting set, called a "Galois connection". A vertex subset satisfies
the Galois connection if the removal of any of its vertices decreases the number
of hyperedges intersecting with it. The sets satisfying the Galois connection form
a set system satisfying the monotone property (independent set system), thus
we can perform a breadth-first search in the usual way.

1.2 Related Problems 

Dualization has many equivalent problems. We show some of them below. 

(1) minimal set cover enumeration

For a subset family $\mathcal{F}$ defined on a set $E$, a set cover $\mathcal{S}$ is a subset of $\mathcal{F}$ such
that the union of the members of $\mathcal{S}$ is equal to $E$, i.e., $E = \bigcup_{X \in \mathcal{S}} X$. A set
cover is called minimal if it includes no other set cover. We consider $\mathcal{F}$ to
be a vertex set, and $\mathcal{F}(v)$ to be a hyperedge, where $\mathcal{F}(v)$ is the set of $F \in \mathcal{F}$
that include $v$. Then, for the hyperedge set (set family) $\mathcal{H} = \{\mathcal{F}(v) \mid v \in E\}$, a
hitting set of $\mathcal{H}$ is a set cover of $\mathcal{F}$, and vice versa. Thus, enumerating minimal
set covers is equivalent to dualization.
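
As a small sanity check of this equivalence, the following Python sketch computes the
minimal set covers of a toy family directly from the definition, and separately the
minimal hitting sets of the hyperedges $\mathcal{F}(v)$. The family, the member names "A", "B", "C"
and the helper minimal_sets are assumptions made for illustration; the brute-force
enumeration is only meant for tiny inputs.

```python
from itertools import combinations


def minimal_sets(universe, is_feasible):
    """All inclusion-minimal subsets of `universe` accepted by `is_feasible`.

    Brute force by increasing size; assumes `is_feasible` is monotone
    (supersets of feasible sets are feasible).
    """
    found = []
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            s = frozenset(combo)
            if is_feasible(s) and not any(h <= s for h in found):
                found.append(s)
    return set(found)


# Toy family on the ground set E = {1, 2, 3}; members are named A, B, C.
family = {"A": {1, 2}, "B": {2, 3}, "C": {3}}
ground = {1, 2, 3}

# Minimal set covers, straight from the definition.
covers = minimal_sets(
    family, lambda s: set().union(*(family[n] for n in s)) == ground
)

# Hyperedge F(v): the names of the members that contain v.
edges = [{n for n, members in family.items() if v in members} for v in ground]
hitting = minimal_sets(family, lambda s: all(s & e for e in edges))
```

On this toy input both computations yield the same two sets, {A, B} and {A, C}.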

(2) minimal uncovered set enumeration

For a subset family $\mathcal{F}$ defined on a set $E$, an uncovered set $S$ is a subset of $E$
such that $S$ is not included in any member of $\mathcal{F}$. Let $\bar{\mathcal{F}}$ be the complement of $\mathcal{F}$,
which is the set of the complements of members in $\mathcal{F}$, i.e., $\bar{\mathcal{F}} = \{E \setminus X \mid X \in \mathcal{F}\}$.
$S$ is not included in $X \in \mathcal{F}$ if and only if $S$ and $E \setminus X$ have a non-empty
intersection. An uncovered set of $\mathcal{F}$ is a hitting set of $\bar{\mathcal{F}}$, and vice versa, thus
the minimal uncovered set enumeration is equivalent to the minimal hitting set
enumeration.

(3) circuit enumeration for independent system

A subset family $\mathcal{F}$ defined on $E$ is called an independent system if for each
member $X$ of $\mathcal{F}$, any of its subsets is also a member of $\mathcal{F}$. A subset of $E$ is
called independent if it is a member of $\mathcal{F}$, and dependent otherwise. A circuit is
a minimal dependent set, i.e., a dependent set which properly contains no other
dependent set. When an independent system $\mathcal{F}$ is given by the set of its maximal
independent sets, the enumeration of circuits of $\mathcal{F}$ is equivalent to the
enumeration of minimal uncovered sets of that set of maximal independent sets.

(4) Computing negative border from positive border

A function is called Boolean if it maps subsets in $2^E$ to $\{0, 1\}$. A Boolean function
$B$ is called monotone (resp., anti-monotone) if for any set $X$ with $B(X) = 0$
(resp., $B(X) = 1$), any subset $X'$ of $X$ satisfies $B(X') = 0$ (resp., $B(X') = 1$).
For a monotone function $B$, a subset $X$ is called a positive border if $B(X) = 0$
and no proper superset $Y$ of $X$ satisfies $B(Y) = 0$, and is called a negative border
if $B(X) = 1$ and no proper subset $Y$ of $X$ satisfies $B(Y) = 1$. When we are given
a Boolean function by the set of positive borders, the problem is to enumerate
all of its negative borders. This problem is equivalent to dualization, since the
problem is equivalent to uncovered set enumeration.

(5) DNF to CNF transformation
A CNF is a formula whose clauses are composed of literals connected by "or"
and whose clauses are connected by "and". A DNF is a formula whose clauses are
composed of literals connected by "and", and whose clauses are connected by
"or". Any formula can be represented as a DNF formula and as a CNF formula.
A DNF/CNF is called monotone if no clause contains a literal with "not".
Let $D$ be a monotone CNF formula composed of variables $x_1, \ldots, x_n$ and clauses
$C_1, \ldots, C_m$. Then, $S$ is a hitting set of the clauses of $D$ if and only if the
assignment obtained by setting the literals in $S$ to true gives a true assignment of $D$.
Let $H$ be a minimal DNF formula equivalent to $D$. $H$ has to include any minimal
hitting set of $D$ as its clause, since any clause of $H$ has to contain at least one
literal of any clause of $D$. Thus, a minimal DNF equivalent to $D$ has to include all
minimal hitting sets of $D$. For the same reason, computing the minimal CNF from a
DNF is equivalent to dualization.

2 Preliminaries 

A hypergraph $\mathcal{F}$ is a subset family $\{F_1, \ldots, F_m\}$ defined on a vertex set $V$, that
is, each element (called a hyperedge) $F$ of $\mathcal{F}$ is a subset of $V$. The hypergraph is
a generalization of a graph so that edges can contain more than two vertices.
A subset $H$ of $V$ is called a vertex subset. A hitting set is a vertex subset $H$
such that $H \cap F \neq \emptyset$ for any hyperedge $F \in \mathcal{F}$. A hitting set is called minimal
if it includes no other hitting set. The dual of a hypergraph is the hypergraph
whose hyperedge set is the set of all minimal hitting sets, and it is denoted
by $\mathit{dual}(\mathcal{F})$. For example, when $V = \{1,2,3,4\}$ and $\mathcal{F} = \{\{1,2\}, \{1,3\}, \{2,3,4\}\}$,
$\{1,3,4\}$ is a hitting set but not minimal, and $\{2,3\}$ is a minimal hitting set.
$\mathit{dual}(\mathcal{F})$ is $\{\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}\}$. It is known that $\mathcal{F} = \mathit{dual}(\mathit{dual}(\mathcal{F}))$ if
no $F, F' \in \mathcal{F}$ satisfy $F \subset F'$. The dualization of a hypergraph is to construct
the dual of the given hypergraph.
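
The example above can be reproduced with a brute-force Python sketch. This is only a
reference check for tiny inputs, not one of the algorithms discussed in this paper; the
function name dual and the use of frozenset are illustrative assumptions.

```python
from itertools import combinations


def dual(vertices, hyperedges):
    """Brute-force dual: all minimal hitting sets, by increasing size."""
    found = []
    vs = sorted(vertices)
    for r in range(len(vs) + 1):
        for combo in combinations(vs, r):
            s = frozenset(combo)
            hits_all = all(s & e for e in hyperedges)
            # Smaller hitting sets were found earlier, so a set is minimal
            # exactly when it contains none of them.
            if hits_all and not any(h <= s for h in found):
                found.append(s)
    return set(found)


V = {1, 2, 3, 4}
F = {frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3, 4})}
D = dual(V, F)
DD = dual(V, D)
```

Since no hyperedge of this example contains another, applying dual twice returns
the original hypergraph.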

$|\mathcal{F}|$ denotes the number of hyperedges in $\mathcal{F}$, that is $m$, and $\|\mathcal{F}\|$ denotes the
sum of the sizes of hyperedges in $\mathcal{F}$, respectively. In particular, $\|\mathcal{F}\|$ is called
the size of $\mathcal{F}$. $\mathcal{F}_i$ denotes the hypergraph composed of hyperedges $\{F_1, \ldots, F_i\}$.
For $v \in V$, let $\mathcal{F}(v)$ be the set of hyperedges in $\mathcal{F}$ that include $v$, i.e., $\mathcal{F}(v) =
\{F \mid F \in \mathcal{F}, v \in F\}$. For a vertex subset $S$ and a vertex $v$, we respectively denote
$S \cup \{v\}$ and $S \setminus \{v\}$ by $S \cup v$ and $S \setminus v$.

We introduce the new concept critical hyperedge in the following. For a vertex
subset $S \subseteq V$, $\mathit{uncov}(S)$ denotes the set of hyperedges that do not intersect with
$S$, i.e., $\mathit{uncov}(S) = \{F \mid F \in \mathcal{F}, F \cap S = \emptyset\}$. $S$ is a hitting set if and only if
$\mathit{uncov}(S) = \emptyset$. For a vertex $v \in S$, a hyperedge $F \in \mathcal{F}$ is said to be critical for $v$
if $S \cap F = \{v\}$. We denote the set of all critical hyperedges for $v$ by $\mathit{crit}(v, S)$,
i.e., $\mathit{crit}(v, S) = \{F \mid F \in \mathcal{F}, S \cap F = \{v\}\}$. Suppose that $S$ is a hitting set. If $v$
has no critical hyperedge, every $F \in \mathcal{F}$ includes a vertex in $S$ other than $v$, thus
$S \setminus v$ is also a hitting set. Therefore, we have the following property.

Property 1. $S$ is a minimal hitting set if and only if $\mathit{uncov}(S) = \emptyset$, and
$\mathit{crit}(v, S) \neq \emptyset$ holds for any $v \in S$.

If $\mathit{crit}(v, S) \neq \emptyset$ for any $v \in S$, we say that $S$ satisfies the minimality
condition. Our algorithm updates $\mathit{crit}$ to check the minimality condition quickly, by
utilizing the following lemmas. Let us consider an example of $\mathit{crit}$. Suppose that
$\mathcal{F} = \{\{1,2\}, \{1,3\}, \{2,3,4\}\}$, and the hitting set $S$ is $\{1,3,4\}$. We can see that
$\mathit{crit}(1, S) = \{\{1,2\}\}$, $\mathit{crit}(3, S) = \emptyset$, $\mathit{crit}(4, S) = \emptyset$, thus $S$ is not minimal, and
we can remove either 3 or 4. For $S' = \{1,3\}$, $\mathit{crit}(1, S') = \{\{1,2\}\}$ and $\mathit{crit}(3, S') =
\{\{2,3,4\}\}$, thus $S'$ is a minimal hitting set. The following lemmas are the keys
to our algorithms.
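
The definitions of $\mathit{uncov}$ and $\mathit{crit}$ and Property 1 translate directly into code. The
following Python sketch recomputes everything from scratch on each call, so it shows
only the definitions, not the incremental updates used by our algorithms; the function
names are illustrative assumptions.

```python
def uncov(S, hyperedges):
    """Hyperedges that do not intersect S."""
    return [F for F in hyperedges if not (F & S)]


def crit(v, S, hyperedges):
    """Hyperedges whose only common vertex with S is v."""
    return [F for F in hyperedges if F & S == {v}]


def is_minimal_hitting_set(S, hyperedges):
    """Property 1: nothing uncovered, and every vertex has a critical hyperedge."""
    return not uncov(S, hyperedges) and all(crit(v, S, hyperedges) for v in S)


edges = [frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3, 4})]
S = {1, 3, 4}
S_prime = {1, 3}
```

With these inputs, crit(3, S, edges) and crit(4, S, edges) are empty, so S fails the
minimality condition, while S_prime passes the test of Property 1.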

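Lemma 1. For any vertex subset $S$, $v \in S$ and $v' \notin S$, $\mathit{crit}(v, S \cup v') =
\mathit{crit}(v, S) \setminus \mathcal{F}(v')$. Particularly, $\mathit{crit}(v, S \cup v') \subseteq \mathit{crit}(v, S)$ holds.

Proof. For any $F \in \mathit{crit}(v, S)$, $(S \cup v') \cap F = \{v\}$ holds if $F$ is not in $\mathcal{F}(v')$, and
thus $\mathit{crit}(v, S) \setminus \mathcal{F}(v')$ is included in $\mathit{crit}(v, S \cup v')$. Conversely, $(S \cup v') \cap F = \{v\}$
holds for any $F \in \mathit{crit}(v, S \cup v')$. This means that $S \cap F = \{v\}$ and $v' \notin F$, and
hence $F \in \mathit{crit}(v, S) \setminus \mathcal{F}(v')$. □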

Lemma 2. For any vertex subset $S$ and $v' \notin S$, $\mathit{crit}(v', S \cup v') = \mathit{uncov}(S) \cap
\mathcal{F}(v')$.

Proof. Since any hyperedge $F$ not in $\mathit{uncov}(S)$ has a non-empty intersection with
$S$, $F$ can never be a critical hyperedge for $v'$. Any critical hyperedge for $v'$
includes $v'$, thus we can see that $\mathit{crit}(v', S \cup v') \subseteq \mathit{uncov}(S) \cap \mathcal{F}(v')$. Conversely, for
any hyperedge $F$ included in $\mathit{uncov}(S) \cap \mathcal{F}(v')$, $F \cap (S \cup v') = \{v'\}$, thereby
$\mathit{uncov}(S) \cap \mathcal{F}(v') \subseteq \mathit{crit}(v', S \cup v')$. Hence, the lemma holds. □
The next two lemmas follow directly from the above. 

Lemma 3. [7] If a vertex subset S satisfies the minimality condition, any of 
its subsets also satisfy the minimality condition, i.e., the minimality condition 
satisfies the monotone property. 

Lemma 4. [7] If a vertex subset S does not satisfy the minimality condition, 
S is not included in any minimal hitting set. In particular, any minimal hitting 
set S is maximal in the set system composed of vertex subsets satisfying the 
minimality condition. 

Lemma 5. For any vertex subset $S$, $\sum_{v \in S} |\mathit{crit}(v, S)| \le |\mathcal{F}|$.

Proof. From the definition of the critical hyperedge, any hyperedge $F \in \mathcal{F}$ can
be critical for at most one vertex. Thus, the lemma holds. □



3 Existing Algorithms 



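This section is devoted to explaining the framework of the existing algorithms
related to our algorithms: the DL algorithm, KS algorithm, and HBC algorithm.
The DL algorithm starts by computing $\mathit{dual}(\mathcal{F}_1)$ and then iteratively computes
$\mathit{dual}(\mathcal{F}_i)$ from $\mathit{dual}(\mathcal{F}_{i-1})$. For any $S \in \mathit{dual}(\mathcal{F}_i)$, either $S \in \mathit{dual}(\mathcal{F}_{i-1})$ holds,
or $S \setminus v \in \mathit{dual}(\mathcal{F}_{i-1})$ holds for $\{v\} = S \cap F_i$. Note that when $S \in \mathit{dual}(\mathcal{F}_i)$ is not
in $\mathit{dual}(\mathcal{F}_{i-1})$, $S \cap F_i$ is composed of exactly one vertex, since $\mathit{crit}(v, S)$ must
be $\{F_i\}$. Conversely, for any $S \in \mathit{dual}(\mathcal{F}_{i-1})$, $S \in \mathit{dual}(\mathcal{F}_i)$ if $S \cap F_i \neq \emptyset$. When
$S \cap F_i = \emptyset$, $S \cup v$ with $v \in F_i$ may be in $\mathit{dual}(\mathcal{F}_i)$. The algorithm is as follows.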

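ALGORITHM DL ($\mathcal{F} = \{F_1, \ldots, F_m\}$)

1. $\mathcal{D}_0 := \{\emptyset\}$
2. for $i := 1$ to $m$
3.   $\mathcal{D}_i := \emptyset$
4.   for each $S \in \mathcal{D}_{i-1}$ do
5.     if $S \cap F_i \neq \emptyset$ then insert $S$ to $\mathcal{D}_i$
6.     else for each $v \in F_i$ do
7.       if no $S' \in \mathcal{D}_{i-1}$ satisfies $S' \neq S$ and $S' \subseteq S \cup v$ then insert $S \cup v$ to $\mathcal{D}_i$
8.     end for
9.   end for
10. end for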

After the computation, $\mathcal{D}_m$ is $\mathit{dual}(\mathcal{F})$. Line 7 is for checking whether $S \cup v$
is in $\mathcal{D}_i$ or not by looking for a hitting set included in $S \cup v$. This needs
basically $O(\|\mathcal{D}_{i-1}\|)$ time and is a bottleneck computation of the algorithm.
This part requires all of $\mathcal{D}_{i-1}$ in memory, thus we need to perform a breadth-first
search. Kavvadias and Stavropoulos [8, 9] proposed a depth-first version of this
algorithm. According to the hitting set generation rule, each minimal hitting set
of $\mathcal{F}_i$ is uniquely generated from a minimal hitting set of $\mathcal{F}_{i-1}$. Thus, starting from
each minimal hitting set of $\mathcal{F}_1$, we perform this generation rule in a depth-first
manner, and visit all the minimal hitting sets of all $\mathcal{F}_i$. The algorithm does not store
each $\mathcal{D}_i$ in memory, and it checks for the minimality of $S \cup v$ by checking whether
$S \cup v \setminus u$ is a hitting set or not for each $u \in S$. The algorithm is as follows.

ALGORITHM KS ($S$, $i$)

1. if $i = m + 1$ then output $S$; return
2. if $S \cap F_i \neq \emptyset$ then call KS($S$, $i + 1$)
3. else for each $v \in F_i$ do
4.   for each $u \in S$ do
5.     if $S \cup v \setminus u$ is a hitting set of $\mathcal{F}_i$ then go to 8.
6.   end for
7.   call KS($S \cup v$, $i + 1$)
8. end for

The bottleneck is also the minimality check on line 5, which basically needs to
access all hyperedges in $\mathcal{F}_{i-1}$.
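
The two frameworks above can be sketched in Python as follows. This is a minimal
illustration under simplifying assumptions (hyperedges are non-empty Python sets,
indices are 0-based, and no special data structures are used); the function names
dl_dualize and ks_dualize are hypothetical and do not refer to the original
implementations of [4] or [8, 9].

```python
def dl_dualize(hyperedges):
    """Breadth-first framework: keep all of D_{i-1} in memory."""
    current = [frozenset()]                      # D_0 = {empty set}
    for edge in hyperedges:
        nxt = []
        for S in current:
            if S & edge:                         # S already hits F_i
                nxt.append(S)
                continue
            for v in sorted(edge):
                T = S | {v}
                # Line 7 of DL: T is minimal iff no other member of
                # D_{i-1} is contained in it.
                if not any(o != S and o <= T for o in current):
                    nxt.append(T)
        current = nxt
    return set(current)


def ks_dualize(hyperedges):
    """Depth-first framework: minimality is tested without stored solutions."""
    m = len(hyperedges)
    results = []

    def hits_first(S, count):
        # True if S intersects each of the first `count` hyperedges.
        return all(S & hyperedges[j] for j in range(count))

    def ks(S, i):                                # S is minimal for the first i edges
        if i == m:
            results.append(S)
            return
        edge = hyperedges[i]
        if S & edge:
            ks(S, i + 1)
            return
        for v in sorted(edge):
            T = S | {v}
            # Line 5 of KS: reject T if dropping some u still hits everything.
            if not any(hits_first(T - {u}, i + 1) for u in S):
                ks(T, i + 1)

    ks(frozenset(), 0)
    return set(results)


example = [{1, 2}, {1, 3}, {2, 3, 4}]
dl_result = dl_dualize(example)
ks_result = ks_dualize(example)
```

Both sketches return the four minimal hitting sets of the example in Section 2; the
difference lies in what must be kept in memory during the minimality check.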



The number of hitting sets to which a vertex is added is $|\bigcup_{k=1}^{m} \mathit{dual}(\mathcal{F}_k)| =
O(|\bigcup_{i=1}^{m} \mathcal{D}_i|)$. The algorithms perform the minimality check for each addition,
thus roughly speaking, the number of minimality checks in both algorithms is
$\Theta(|\bigcup_{i=1}^{m} \mathcal{D}_i| \times f)$, where $f$ is the average size of hyperedges. For $S \in \mathcal{D}_i$ and
$v \in S$, $\mathit{crit}(v, S)$ is non-empty, since $v$ always has a critical hyperedge in $\mathcal{F}_i$.
It implies that any subset $S$ explored by the algorithm satisfies the minimality
condition.

The minimality check is usually one of the time-consuming parts of dualization
algorithms. The check whether the current vertex subset is a hitting set or
not is also a time-consuming part, but it can be done by updating $\mathit{uncov}$, thus
for almost all vertex subsets to be operated on, its cost is much smaller than the
minimality check. Therefore, the number of minimality checks would be a good
measure of the efficiency of the search strategy. Here, we define the search space
of an algorithm as the set of vertex subsets whose minimality is checked. The
size of the search space is equal to the number of executed minimality checks.

The cost for the minimality check increases with $|\mathcal{D}_i|$ for the DL algorithm,
and with $|S|$ and $\|\mathcal{F}\|$ for the KS algorithm. Thus, the DL algorithm will be faster
when $\mathit{dual}(\mathcal{F})$ is small, whereas the KS algorithm will be faster when $\|\mathcal{F}\|$
is small and $S$ is small on average.

The HBC algorithm is a kind of branch and bound algorithm. It starts from
the empty set, and chooses elements one by one. For each element $v$, it generates
two recursive calls corresponding to a choice: add $v$ to the current vertex subset,
or do not add it. When the current vertex subset becomes a hitting set, it
checks the minimality, and outputs it if minimal. To speed up the computation,
the algorithm prunes branches through the use of the so-called Galois condition.
The Galois condition for $S$ and $v \notin S$ is violated when $|\mathit{uncov}(S)| = |\mathit{uncov}(S \cup v)|$;
in that case, $S \cup v$ is never included in a minimal hitting set, thus we can terminate
the recursive call with respect to $S \cup v$. The Galois condition is equivalent to our
minimality condition, since by Lemma 2 its violation is equivalent to
$\mathit{crit}(v, S \cup v) = \emptyset$.^1 The algorithm is written as follows.

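ALGORITHM HBC ($\mathcal{F} = \{F_1, \ldots, F_m\}$)

1. $\mathcal{D}_0 := \{\emptyset\}$; $i := 0$
2. while $\mathcal{D}_i \neq \emptyset$
3.   for each $S \in \mathcal{D}_i$ do
4.     if $\mathit{uncov}(S) = \emptyset$ then output $S$
5.     for each $v$ larger than the maximum vertex in $S$ do
6.       if $S \cup v$ satisfies the Galois condition then insert $S \cup v$ to $\mathcal{D}_{i+1}$
7.     end for
8.   end for; $i := i + 1$
9. end while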



^1 The Galois condition was proposed in 2007 [7], while $\mathit{crit}$ was proposed in 2003 [12, 15].
The term "minimality condition" first appears in [7].



If the pruning method is only the Galois condition, the vertex subsets
explored by the algorithm are all the non-hitting sets satisfying the minimality
condition. Thus, the size of the search space of the HBC algorithm is no less than that
of the DL algorithm. On the other hand, the DL and KS algorithms have to update the
minimal hitting sets even if they do not change, thus the HBC algorithm has an
advantage on this point.

4 New Search Algorithms and Minimality Check

We propose two depth-first search (branch and bound) algorithms for the dualization
problem. The main differences from the existing algorithms are the use of $\mathit{crit}$ for
the minimality condition check, and pruning methods to avoid searching hopeless
branches. The algorithms keep lists crit[u] and uncov representing $\mathit{crit}(u, S)$ and
$\mathit{uncov}(S)$. When the algorithm adds a vertex $v$ to $S$ and generates a recursive
call, it updates crit[] and uncov by the following algorithm.

Update_crit_uncov ($v$, crit[], uncov)

1. for each $F \in \mathcal{F}(v)$ do
2.   if $F \in$ crit[u] for a vertex $u \in S$ then remove $F$ from crit[u]
3.   if $F \in$ uncov then uncov := uncov $\setminus F$; crit[v] := crit[v] $\cup \{F\}$
4. end for

After execution, crit[u] becomes $\mathit{crit}(u, S \cup v)$. Since each hyperedge $F$ can be a
critical hyperedge for at most one vertex, we mark $F$ with the vertex it is critical
for, and thereby perform step 2 in constant time. Thus, the time complexity of this
algorithm is $O(|\mathcal{F}(v)|)$. Even though this algorithm is simple, we can reduce the time
complexity of an iteration of the KS algorithm from $O(|V| \times \|\mathcal{F}\|)$ to $O(\|\mathcal{F}\|)$.
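
A minimal Python sketch of this update, together with the corresponding undo step
needed when a recursive call returns, is given below. It assumes that hyperedges are
identified by integer indices; the names edges_of (for $\mathcal{F}(v)$), owner (the mark recording
the vertex a hyperedge is critical for) and the log-based undo are illustrative
assumptions, and Python sets stand in for the linked lists of an actual implementation.

```python
def update_crit_uncov(v, crit, uncov, edges_of, owner):
    """Add v to S; returns a log so that the change can be undone."""
    log = []
    crit[v] = set()
    for f in edges_of[v]:                 # only F(v) is scanned: O(|F(v)|)
        u = owner.get(f)
        if u is not None:                 # f was critical for u, now hit twice
            crit[u].discard(f)
            del owner[f]
            log.append((f, u))
        elif f in uncov:                  # f becomes critical for v
            uncov.discard(f)
            crit[v].add(f)
            owner[f] = v
            log.append((f, None))
    return log


def undo_update(v, crit, uncov, owner, log):
    """Restore crit, uncov and owner to their state before v was added."""
    for f, u in reversed(log):
        if u is None:
            uncov.add(f)
            del owner[f]
        else:
            crit[u].add(f)
            owner[f] = u
    del crit[v]


# Hyperedges 0: {1,2}, 1: {1,3}, 2: {2,3,4}, given by vertex -> incident edges.
edges_of = {1: [0, 1], 2: [0, 2], 3: [1, 2], 4: [2]}
crit, uncov, owner = {}, {0, 1, 2}, {}
log1 = update_crit_uncov(1, crit, uncov, edges_of, owner)
log3 = update_crit_uncov(3, crit, uncov, edges_of, owner)
state_13 = ({v: set(fs) for v, fs in crit.items()}, set(uncov))
undo_update(3, crit, uncov, owner, log3)
```

After adding vertices 1 and 3, uncov is empty and both vertices keep a critical
hyperedge, which matches the example of Section 2.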

4.1 Reverse Search Algorithm 

One of our algorithms is based on the reverse search [1], and it can be regarded
as an improved version of the KS algorithm. Let $\mathcal{S} = \bigcup_{i=1}^{m} \mathit{dual}(\mathcal{F}_i)$, that is, the set of
vertex subsets that are operated on by the KS algorithm. Let us denote the minimum
$i$ such that $F_i \in \mathit{crit}(v, S)$ by $\mathit{min\_crit}(v, S)$, and the minimum $i$ such that $F_i \in
\mathit{uncov}(S)$ by $\mathit{min\_uncov}(S)$. $\mathit{min\_crit}(v, S)$ (resp., $\mathit{min\_uncov}(S)$) is defined as
$m + 1$ if $\mathit{crit}(v, S)$ (resp., $\mathit{uncov}(S)$) is empty. Using these terms, we give a
characterization of $\mathcal{S}$.

Lemma 6. A vertex subset $S \neq \emptyset$ belongs to $\mathcal{S}$ if and only if
$\mathit{min\_crit}(v, S) < \mathit{min\_uncov}(S)$ holds for any $v \in S$.

Proof. Suppose that $S \in \mathcal{S}$, thus $S \in \mathit{dual}(\mathcal{F}_i)$ for some $i$. We can see that
$\mathit{min\_uncov}(S) > i$, $\mathit{crit}(v, S)$ includes a hyperedge $F_j \in \mathcal{F}$ with $j \le i$, and thus
$\mathit{min\_crit}(v, S) \le i$ for any $v$. Thus, $\mathit{min\_crit}(v, S) < \mathit{min\_uncov}(S)$ holds for any $v \in S$.

Conversely, suppose that $\mathit{min\_crit}(v, S) < \mathit{min\_uncov}(S)$ holds for any $v \in S$.
Then, we can see that $\mathit{crit}(v, S) \neq \emptyset$ for any $v \in S$ because $\mathit{min\_crit}(v, S) <
m + 1$. Let $i = \mathit{min\_uncov}(S) - 1$. Note that $i \le m$. We can then see that $S$ is a
hitting set of $\mathcal{F}_i$ and $\mathit{min\_crit}(v, S) \le i$. This in turn implies that $S$ is a minimal
hitting set of $\mathcal{F}_i$, and thus, it belongs to $\mathcal{S}$. □

For $S \in \mathcal{S}$, $\mathit{min\_crit}(S)$ is the minimum index $i$ such that $S$ is a minimal
hitting set of $\mathcal{F}_i$, i.e., $\mathit{min\_crit}(S) = \max_{v \in S} \{\mathit{min\_crit}(v, S)\}$. We define the
parent $P(S)$ of $S$ by $S \setminus v$, where $v$ is the vertex such that $\mathit{min\_crit}(v, S) =
\mathit{min\_crit}(S)$. Since any $F_i$ is critical for at most one vertex, $\mathit{min\_crit}(S)$ and the
parent are uniquely defined. The parent-child relation given by this definition is
acyclic, thus it forms a tree spanning all the vertex subsets in $\mathcal{S}$ and rooted at the
empty set. Our algorithm performs a depth-first search on this tree starting from
the empty set. This kind of search strategy is called reverse search [1].

This search strategy is essentially equivalent to the KS algorithm if we skip all
redundant iterations in which we add no vertex to the current vertex subset. In a
straightforward implementation of the KS algorithm, we have to iteratively compute
the intersection of $F_i$ and the current vertex subset $S$ until we meet the $F_i$ that
does not intersect with $S$. When $\mathit{uncov}(S)$ is not so large, this takes a long time.
Particularly, when $\mathit{uncov}(S) = \emptyset$, we may spend $\Theta(\|\mathcal{F}\|)$ time. In contrast, in
our strategy, we have only to maintain $\mathit{uncov}(S)$, which is much lighter.

The depth-first search starts from the empty set. When it visits a vertex
subset S, it finds all children of S iteratively and generates a recursive call for 
each child. In this way, we can perform a depth-first search only by finding 
children of the current vertex subset. The way to find the children is shown in 
the following lemma. 

Lemma 7. Let $S \in \mathcal{S}$ and $i = \mathit{min\_uncov}(S)$. A vertex subset $S'$ is a child of
$S$ if and only if

(1) $i < m + 1$,

(2) $S' = S \cup v$ for some $v \in F_i$, and

(3) $\mathit{min\_crit}(v', S') < i$ holds for any $v' \in S$.

Proof. Suppose that $S'$ is a child of $S$. We can see that $\mathit{uncov}(S)$ is not empty,
and thus (1) holds. From the definition of the parent, $S$ is obtained from $S'$
by removing a vertex $v$ from $S'$. From $\mathit{min\_crit}(S') < \mathit{min\_uncov}(S')$ and
$\mathit{uncov}(S) = \mathit{uncov}(S') \cup \mathit{crit}(v, S')$, we obtain $\mathit{min\_crit}(v, S') = \mathit{min\_crit}(S') =
\mathit{min\_uncov}(S)$. This means that $F_i \in \mathit{crit}(v, S')$, and thus (2) holds. This
equation also implies that (3) holds.

Suppose that $S'$ is a vertex subset satisfying (1), (2) and (3). From (2),
we see that $\mathit{min\_crit}(v, S') = i$. Since $\mathit{min\_uncov}(S') > i$, this together with (3)
implies that $S'$ satisfies the conditions in Lemma 6 and thereby is included in
$\mathcal{S}$. $\mathit{min\_crit}(v, S') = i$ and (3) lead to $\mathit{min\_crit}(S') = i$ and $P(S') = S' \setminus v = S$.
Note that condition (1) guarantees the existence of $F_i$ given condition (2), thus
it is implicitly used in the proof. □

From Lemma 7, we can find all children of $S$ by adding each vertex $v \in F_i$
to $S$, and checking (3). This can be done in a short time by updating $\mathit{crit}$. The
algorithm is as follows.



global variable: crit[], uncov

ALGORITHM RS ($S$)

1. if uncov = $\emptyset$ then output $S$; return
2. $i := \min\{j \mid F_j \in$ uncov$\}$
3. for each $v \in F_i$ do
4.   call Update_crit_uncov ($v$, crit[], uncov)
5.   if $\min\{t \mid F_t \in$ crit[u]$\} < i$ for each $u \in S$ then call RS($S \cup v$)
6.   recover the change to crit[] and uncov done in 4
7. end for

Theorem 1. Algorithm RS enumerates all minimal hitting sets in $O(\|\mathcal{F}\| \times |\mathcal{S}|)$
time and $O(\|\mathcal{F}\|)$ space.

Proof. Since the parent-child relationship induces a rooted tree spanning all
vertex subsets in $\mathcal{S}$, the algorithm certainly enumerates all vertex subsets in $\mathcal{S}$.
Since any minimal hitting set is included in $\mathcal{S}$, all minimal hitting sets are found
by the algorithm. The update of crit[] and uncov is done in $O(|\mathcal{F}(v)|)$ time, thus
an iteration of the algorithm takes $O(\|\mathcal{F}\|)$ time. In total, the algorithm takes
$O(\|\mathcal{F}\| \times |\mathcal{S}|)$ time.

The algorithm requires extra memory for storing crit[] and uncov and for
memorizing the hyperedges removed in step 4. Since the lists crit[] and uncov are
pairwise disjoint, the total memory for crit[] and uncov is $O(m)$. If a hyperedge is
removed from a list, it will not be removed again in the deeper levels of the recursion,
from the monotonicity of $\mathit{crit}$. Thus, memorizing the removed hyperedges also needs
$O(m)$ memory. The most memory is for $\mathcal{F}(v)$ of each $v$, and takes $O(\|\mathcal{F}\|)$ space. □
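
The following Python sketch illustrates Algorithm RS under simplifying assumptions:
hyperedges are non-empty sets indexed from 0, Python sets replace the linked lists,
and taking the minimum of a set is done by a plain scan, so the sketch does not attain
the stated time bounds. The names rs_dualize, edges_of, owner, add and remove are
hypothetical.

```python
def rs_dualize(hyperedges):
    """Reverse-search sketch; returns the minimal hitting sets in visit order."""
    m = len(hyperedges)
    edges_of = {}                                  # F(v) as lists of edge indices
    for idx, edge in enumerate(hyperedges):
        for v in edge:
            edges_of.setdefault(v, []).append(idx)
    crit, owner, uncov = {}, {}, set(range(m))
    results = []

    def add(v):                                    # Update_crit_uncov with a log
        log = []
        crit[v] = set()
        for f in edges_of[v]:
            u = owner.pop(f, None)
            if u is not None:                      # f stops being critical for u
                crit[u].discard(f)
                log.append((f, u))
            elif f in uncov:                       # f becomes critical for v
                uncov.discard(f)
                crit[v].add(f)
                owner[f] = v
                log.append((f, None))
        return log

    def remove(v, log):                            # step 6: recover the change
        for f, u in reversed(log):
            if u is None:
                uncov.add(f)
                del owner[f]
            else:
                crit[u].add(f)
                owner[f] = u
        del crit[v]

    def rs(S):
        if not uncov:
            results.append(frozenset(S))
            return
        i = min(uncov)                             # min_uncov(S), 0-based
        for v in sorted(hyperedges[i]):
            log = add(v)
            # Condition (3) of Lemma 7; an empty crit[u] fails the test.
            if all(crit[u] and min(crit[u]) < i for u in S):
                S.append(v)
                rs(S)
                S.pop()
            remove(v, log)

    rs([])
    return results


rs_result = rs_dualize([{1, 2}, {1, 3}, {2, 3, 4}])
```

Each minimal hitting set is reached along exactly one path of the parent-child tree,
so the returned list contains no duplicates.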

Pruning method. Suppose that in an iteration we are operating on a vertex
subset $S$, and have confirmed that $S \cup v$ does not satisfy the minimality condition.
From Lemma 3, we observe that $S' \cup v$ does not satisfy the minimality condition
if $S \subseteq S'$. This means that in the recursive calls generated by the iteration with
respect to $S$, we do not have to care about the addition of $v$, thus we remove
$v$ from the candidate list for addition during the recursive calls. This condition
also holds when $S \cup v$ is a minimal hitting set, since no proper superset of a minimal
hitting set satisfies the minimality condition. We call a vertex $v$ satisfying one
of these conditions violating.

We can apply this pruning method to the RS algorithm by finding all violating
vertices before step 3, and we can output all minimal hitting sets $S \cup v$ found in
the process. We then execute the loop from step 3 to step 7 only for non-violating
vertices, so that we can avoid unnecessary recursive calls.

4.2 Depth-first Search Algorithm 

This subsection describes a simple hill-climbing depth-first search algorithm,
whose search space is contained in that of the HBC algorithm. We start from
$S = \emptyset$, and add vertices to $S$ recursively unless the minimality condition is violated.
To avoid duplication, we use a list of vertices CAND that represents the
vertices that can be added in the iteration. The vertices not included in CAND
will not be added, even if the addition satisfies the minimality condition, i.e.,
the iteration given $S$ and CAND enumerates all minimal hitting sets including
$S$ and included in $S \cup$ CAND by recursively generating calls.

Suppose that an iteration is given $S$ and CAND, and without loss of generality
CAND $= \{v_1, \ldots, v_k\}$. For the first vertex $v_1$, we make a recursive call with
respect to $S \cup v_1$, with CAND = CAND $\setminus v_1$, to enumerate all minimal hitting
sets including $S \cup v_1$. After the termination of the recursive call, we generate a
recursive call for $S \cup v_2$. To avoid finding the minimal hitting sets including $v_1$,
we give CAND $\setminus \{v_1, v_2\}$ to the recursive call. In this way, for each vertex $v_i$, we
generate a recursive call with $S \cup v_i$ and CAND $= \{v_{i+1}, \ldots, v_k\}$. This search
strategy is common to many algorithms for enumerating members in a monotone
set system, for example clique enumeration [13]. That is, its correctness has
already been proved.

Next, let us describe a pruning method coming from the necessary condition
to be a hitting set. Suppose that an iteration is given $S$ and CAND, and let $F$
be a hyperedge in $\mathit{uncov}(S)$. We can see that any minimal hitting set including
$S$ has to include at least one vertex in $F$. Thus, we have to generate recursive
calls with respect to vertices in CAND $\cap F$, but do not have to do so for vertices
in CAND $\setminus F$.

In the RS algorithm, we have to find all violating vertices before generating
recursive calls. In contrast, we can omit this step from our DFS algorithm.
Suppose that an iteration is given $S$ and CAND, and is going to generate
recursive calls with respect to the vertices $v_1, \ldots, v_k \in F \cap$ CAND. Then, we first
set CAND to CAND $\setminus \{v_1, \ldots, v_k\}$. If $v_h$ is not a violating vertex, we generate
a recursive call for $S \cup v_h$, and add $v_h$ to CAND. If $v_h$ is a violating vertex, we
do not add $v_h$ to CAND. In this way, when we generate a recursive call with
respect to $S \cup v_h$, all violating vertices $v_j$, $j > h$, have already been found, thus
there is no need to find all of them at the beginning. The algorithm is described as
follows.



global variable: crit[], uncov, CAND
ALGORITHM DFS(S)
1. if uncov = ∅ then output S; return
2. choose a hyperedge F from uncov;
3. C := CAND ∩ F; CAND := CAND \ C
4. for each v ∈ C do
5.   call Update_crit_uncov(v, crit[], uncov)
6.   if crit(f, S') ≠ ∅ for each f ∈ S then call DFS(S ∪ v); CAND := CAND ∪ v
7.   recover the change to crit[] and uncov done in 5
8. end for



Similar to the case of the RS algorithm, the computation time of an iteration
is bounded by O(||ℱ||).
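The DFS procedure can be sketched in a few lines of Python. This is a minimal
illustration under stated assumptions, not the authors' C implementation: it
assumes that crit(u, S) is the set of hyperedges F with F ∩ S = {u} and that
uncov is the set of hyperedges disjoint from S; it builds fresh copies of crit
and uncov instead of updating global variables and undoing the change; and it
passes CAND as an argument. The name minimal_hitting_sets is hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set


def minimal_hitting_sets(edges: Sequence[Set[int]]) -> List[FrozenSet[int]]:
    """Enumerate all minimal hitting sets of `edges` by depth-first search."""
    edges = [frozenset(f) for f in edges]
    vertices = frozenset().union(*edges)
    results: List[FrozenSet[int]] = []

    def dfs(s, cand, crit, uncov):
        # Step 1: nothing is left uncovered, so S is a minimal hitting set.
        if not uncov:
            results.append(s)
            return
        # Step 2: pick an uncovered hyperedge (here simply the lowest index).
        f = edges[min(uncov)]
        # Step 3: branch only on candidates inside F, and take them out of CAND.
        c = sorted(cand & f)
        cand = cand - f
        for v in c:
            # Step 5: crit and uncov for S' = S ∪ {v}, built as fresh copies.
            new_crit = {u: {j for j in crit[u] if v not in edges[j]} for u in s}
            new_crit[v] = {j for j in uncov if v in edges[j]}
            new_uncov = {j for j in uncov if v not in edges[j]}
            # Step 6: recurse only if every vertex keeps a critical hyperedge,
            # and give v back to CAND only in that case.
            if all(new_crit[u] for u in new_crit):
                dfs(s | {v}, cand, new_crit, new_uncov)
                cand = cand | {v}
            # Step 7 is implicit: the caller's crit and uncov were not modified.

    dfs(frozenset(), vertices, {}, set(range(len(edges))))
    return results


covers = minimal_hitting_sets([{1, 2}, {2, 3}, {3, 4}])
```

For the three hyperedges above, covers holds the sets {1, 3}, {2, 3} and
{2, 4}. Copying crit and uncov costs more than the update-and-recover scheme
in the pseudocode, so the sketch shows only the search order and the pruning,
not the stated time bound.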



4.3 Implementation Issues 

This section is devoted to the computational techniques for improving
efficiency. Our data structure for representing hyperedges and ℱ(v) is an
array list in which the IDs of vertices or hyperedges are stored. Using an
array list speeds up the set operations with respect to ℱ(v) and the listing
of the vertices in a hyperedge. The data structure for crit and uncov is a
doubly linked list. In each iteration, we remove some hyperedge IDs from these
lists and reinsert them after the termination of a recursive call. A doubly
linked list is a good data structure for these operations, as it preserves the
order of IDs in the list.
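The remove-and-reinsert pattern can be illustrated with an array-based doubly
linked list. The class name LinkedIdList and its layout are assumptions made
for this sketch; it also assumes that IDs are reinserted in the reverse order
of their removal, which is what happens when recursive calls return.

```python
from typing import Iterator


class LinkedIdList:
    """Doubly linked list over the IDs 0..n-1, stored in two arrays."""

    def __init__(self, n: int) -> None:
        self.sentinel = n                          # index n is a dummy head
        self.next = list(range(1, n + 1)) + [0]    # next[i] = i + 1, next[n] = 0
        self.prev = [n] + list(range(n))           # prev[0] = n, prev[i] = i - 1

    def remove(self, i: int) -> None:
        """Unlink ID i in O(1); its own prev/next entries stay untouched."""
        self.next[self.prev[i]] = self.next[i]
        self.prev[self.next[i]] = self.prev[i]

    def restore(self, i: int) -> None:
        """Relink ID i in O(1) at its old position."""
        self.next[self.prev[i]] = i
        self.prev[self.next[i]] = i

    def __iter__(self) -> Iterator[int]:
        i = self.next[self.sentinel]
        while i != self.sentinel:
            yield i
            i = self.next[i]


ids = LinkedIdList(5)
ids.remove(1)
ids.remove(3)
after_removal = list(ids)
ids.restore(3)
ids.restore(1)
after_restore = list(ids)
```

Because a removed ID keeps its own neighbour pointers, restoring in reverse
order puts every ID back in its original place without any searching.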

4.4 Using the Adjacency Matrix for Set Operations 

When two subsets S and S' are represented by lists of their elements, set
operations such as intersection and set difference need O(|S| + |S'|) time.
However, when we have the characteristic vectors of S and S', we can do better.
The characteristic vector of S is a vector whose ith element is one if and
only if i is included in S. To take the intersection, we scan S (or S'),
whichever is smaller, and choose the elements included in S' (or S). This
check can be done in O(1) time using the characteristic vector of S' (or S),
thus the computation time is reduced to O(min{|S|, |S'|}). For computing
S \ S', we remove their intersection from S, thus the computation time is also
the same.
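A small sketch of the intersection described above, assuming elements are
integers in the range 0..universe_size-1. The function names are hypothetical.

```python
from typing import List


def characteristic_vector(elements: List[int], universe_size: int) -> List[bool]:
    """flags[i] is True if and only if i is in the set."""
    flags = [False] * universe_size
    for x in elements:
        flags[x] = True
    return flags


def intersection(
    s: List[int], s_flags: List[bool], t: List[int], t_flags: List[bool]
) -> List[int]:
    """Scan the shorter list and test membership in the other set in O(1)."""
    if len(s) <= len(t):
        return [x for x in s if t_flags[x]]
    return [x for x in t if s_flags[x]]


s = [1, 4, 7]
t = [0, 1, 2, 3, 4, 5]
common = intersection(
    s, characteristic_vector(s, 8), t, characteristic_vector(t, 8)
)
```

Only the three elements of s are scanned here, so the work is proportional to
the size of the smaller set.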

Our algorithms take the intersection of (crit and uncov) and ℱ(v). Updating
the characteristic vectors of (crit and uncov) uses O(|ℱ|) memory and does
not increase the time complexity. The characteristic vectors of ℱ(v) for each
v require a lot of memory to store, thus we use them only when ||ℱ|| is larger
than n × |ℱ|/64, i.e., ℱ is dense. Note that in our experiments, all instances
satisfied this condition.
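The density test can be written out directly. The sketch assumes that ||ℱ||
is the sum of the hyperedge sizes and that n is the number of vertices; the
function name is hypothetical.

```python
from typing import Sequence, Set


def use_characteristic_vectors(edges: Sequence[Set[int]], n: int) -> bool:
    """Return True when ||F|| > n * |F| / 64."""
    total_size = sum(len(f) for f in edges)
    return total_size > n * len(edges) / 64


dense = use_characteristic_vectors([{0, 1, 2, 3}] * 10, 128)   # 40 > 20
sparse = use_characteristic_vectors([{0}] * 10, 128)           # 10 > 20 fails
```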

4.5 Choosing the Smallest Hyperedge 

In the DFS algorithm, we can choose an arbitrary hyperedge in uncov as F for
restricting the vertices to be added. We choose a hyperedge including the
smallest number of vertices which have not been pruned, so that the number of
recursive calls generated will be small. Counting such vertices in each
hyperedge in uncov may take longer than just choosing an arbitrary one, but
our preliminary experiments showed that it reduced the computation time almost
by half.
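As a sketch of this selection rule, assuming that the vertices "which have
not been pruned" are exactly those still in CAND (the function name is
hypothetical):

```python
from typing import FrozenSet, Iterable, Sequence


def choose_hyperedge(
    uncov: Iterable[int], edges: Sequence[FrozenSet[int]], cand: FrozenSet[int]
) -> int:
    """Return the index of the uncovered hyperedge with the fewest candidates."""
    return min(uncov, key=lambda j: len(edges[j] & cand))


edges = [frozenset({1, 2, 3}), frozenset({2, 4}), frozenset({1, 2, 3, 4})]
best = choose_hyperedge([0, 1, 2], edges, frozenset({1, 2, 3}))
```

Here hyperedge 1 is chosen, since only one of its vertices is still a
candidate, so the iteration generates at most one recursive call.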

4.6 Pruning Only a Restricted Set of Items 

The pruning method described above can be applied to any vertex. However,
applying it to all possible vertices may take a long time compared with other
parts of an iteration. Sometimes pruning takes a long time but only a few
branches are pruned. Thus, to make the computation time stable, we prune only
the vertices in CAND ∩ F, which are the vertices to be added to the current
solution. This takes time proportional to the time spent by an iteration, thus
it never needs a long time.



4.7 Inputting the Complement of the Hypergraph 

In some instances, ℱ is quite dense, e.g., over 95% of vertices are included in
many F ∈ ℱ. This occurs when the data has no clear structure and has many
minimal hitting sets. We can often find such instances in practice, such as in
minimal infrequent vertex subset mining from maximal frequent vertex subsets.
In such cases, the instance itself takes up a lot of memory, and needs a long
time to be operated on. Here, we can reduce the computation time by using the
complement.

The complement version of our algorithm inputs the complement of each F ∈ ℱ.
The operations of each iteration change so that the vertices to be added are
vertices not in F, and taking the difference in the crit update changes to
taking the intersection. This substantially reduces the computation time,
since we have to access only a small number of vertices/hyperedges. In our
experiments, we found that this idea works well for very dense datasets.
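The change from set difference to intersection can be checked on a toy
instance. The sketch below is an illustration only: it assumes the update
removes from a list of hyperedge IDs those hyperedges that contain the added
vertex v, and all function names are hypothetical.

```python
from typing import List, Set


def occurrences(v: int, sets: List[Set[int]]) -> Set[int]:
    """IDs of the sets that contain v."""
    return {j for j, f in enumerate(sets) if v in f}


def update_direct(ids: Set[int], v: int, edges: List[Set[int]]) -> Set[int]:
    """Original input: drop the hyperedges containing v (set difference)."""
    return ids - occurrences(v, edges)


def update_complement(ids: Set[int], v: int, complements: List[Set[int]]) -> Set[int]:
    """Complement input: keep the hyperedges whose complement contains v."""
    return ids & occurrences(v, complements)


vertices = {0, 1, 2, 3, 4}
edges = [{0, 1, 2, 3}, {1, 2, 3, 4}, {0, 2, 3, 4}]
complements = [vertices - f for f in edges]       # [{4}, {0}, {1}]
all_ids = set(range(len(edges)))
same = all(
    update_direct(all_ids, v, edges) == update_complement(all_ids, v, complements)
    for v in vertices
)
```

On this dense instance each complement holds a single vertex, so the
complement version touches far fewer entries than the original lists.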

5 Computational Experiments 

In this section, we show the results of our computational experiments comparing 
our algorithms with the existing algorithms. 

5.1 Codes and Environments 

Our algorithms are implemented in C, without any sophisticated library such as
binary tree. Existing algorithms are implemented in C++ using the vector class
in the STL. The KS algorithm and the Fredman-Khachiyan algorithm (BEGK [3,16])
were provided by their authors. All tests were performed on a 3.2 GHz Core
i7-960 running Linux with 24 GB of RAM. Note that none of the implementations
used multiple cores. The codes and the instances are available at the author's
Web site (http://research.nii.ac.jp/~uno/dualization.html).

5.2 Problem Instances 

We prepared several instances of problems in several categories as follows. The
first category consists of randomly generated instances. Each hyperedge includes
a vertex i with probability p. The sizes and the probabilities are listed below.
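A generator for such random instances might look as follows. This is a sketch
rather than the generator used for the experiments; the function name, the
seed parameter and the vertex numbering 0..n-1 are assumptions.

```python
import random
from typing import FrozenSet, List


def random_hypergraph(n: int, m: int, p: float, seed: int = 0) -> List[FrozenSet[int]]:
    """Return m hyperedges over vertices 0..n-1.

    Each vertex joins each hyperedge independently with probability p.
    """
    rng = random.Random(seed)
    return [
        frozenset(v for v in range(n) if rng.random() < p)
        for _ in range(m)
    ]


instance = random_hypergraph(n=50, m=1000, p=0.9)
```

The expected hyperedge size is n × p, so the values n = 50 and p = 0.9 chosen
here give hyperedges of about 45 vertices on average.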

The instances in the second category were generated from the dataset
"connect-4" taken from the UCI Machine Learning Repository [14]. Connect-4 is
a board game, and each row of the dataset corresponds to a minimal
winning/losing stage of the first player, and a minimal hitting set of a set
of winning stages is a minimal way to disturb winning/losing plays of the
first player. From the dataset of winning/losing stages, we took the first m
rows to make problem instances of different sizes.

The third instances are generated from the frequent itemset (pattern) mining
problem. An itemset is a hyperedge in our terminology. For a set family ℱ and
a support threshold σ, an itemset is called frequent if it is included in at
least σ hyperedges, and infrequent otherwise. A frequent itemset included in
no other frequent itemset is called a maximal frequent itemset, and an
infrequent itemset including no other infrequent itemset is called a minimal
infrequent itemset. A minimal infrequent itemset is a minimal itemset included
in no maximal frequent itemset, and any proper subset of it is included in at
least one maximal frequent itemset. Thus, the dual of the set of the
complements of maximal frequent itemsets is the set of minimal infrequent
itemsets. The problem instances are generated by enumerating all maximal
frequent sets from the datasets "BMS-WebView-2" and "accidents", taken from
the FIMI repository [10]. The profiles of the datasets are listed below.
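The relation between minimal infrequent itemsets and the dual can be checked
by brute force on a toy database. Everything in this sketch (function names,
the example transactions, the threshold) is made up for illustration, and the
exhaustive search is only feasible for a handful of items.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, List, Set


def powerset(items: Iterable[int]) -> Iterator[FrozenSet[int]]:
    items = sorted(items)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)


def brute_force_dual(edges: List[FrozenSet[int]], vertices: Iterable[int]) -> Set[FrozenSet[int]]:
    """All minimal hitting sets, found by exhaustive search."""
    hitting = [x for x in powerset(vertices) if all(x & f for f in edges)]
    return {x for x in hitting if not any(y < x for y in hitting)}


def minimal_infrequent(transactions: List[Set[int]], sigma: int) -> Set[FrozenSet[int]]:
    items = frozenset().union(*transactions)
    infrequent = [x for x in powerset(items)
                  if sum(1 for t in transactions if x <= t) < sigma]
    return {x for x in infrequent if not any(y < x for y in infrequent)}


def dual_of_complements(transactions: List[Set[int]], sigma: int) -> Set[FrozenSet[int]]:
    items = frozenset().union(*transactions)
    frequent = [x for x in powerset(items)
                if sum(1 for t in transactions if x <= t) >= sigma]
    maximal = [x for x in frequent if not any(x < y for y in frequent)]
    return brute_force_dual([items - x for x in maximal], items)


db = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 4}]
left = minimal_infrequent(db, sigma=2)
right = dual_of_complements(db, sigma=2)
```

With threshold 2 the maximal frequent itemsets are {1, 2}, {1, 3} and {2, 3},
and both computations return the same two itemsets, {4} and {1, 2, 3}.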

The fourth instances are used in previous studies [9, 3]. 

• Matching graph (M(n)): a hypergraph with n vertices (n is even) and n/2
hyperedges forming a perfect matching, that is, hyperedge F_i is {2i - 1, 2i}.
This instance has few hyperedges but a large number of minimal hitting sets,
2^(n/2).

• Dual Matching graph (DM(n)): it is dual(M(n)). It has 2^(n/2) hyperedges on
n nodes. This instance has a large number of hyperedges but a small number of
minimal hitting sets, n/2.

• Threshold graph (TH(n)): a hypergraph with n vertices (n is even) and
hyperedge set {{i, j} : 1 ≤ i < j ≤ n, j is even}. This instance has a small
number of hyperedges and a small number of minimal hitting sets, n/2 + 1.

• Self-Dual Threshold graph (SDTH(n)): The hyperedge set of SDTH(n) is
given as {{n - 1, n}} ∪ {{n - 1} ∪ E | E ∈ TH(n - 2)} ∪ {{n} ∪ E | E ∈
dual(TH(n - 2))}. SDTH(n) has the same number of minimal hitting sets as its
hyperedges, (n - 2)^2/4 + n/2 + 1.

• Self-Dual Fano-Plane graph (SDFP(n)): A hypergraph with n vertices and
(k - 2)^2/4 + k/2 + 1 hyperedges, where k = (n - 2)/7. The construction starts
with the set of lines in a Fano plane H_0 = {{1, 2, 3}, {1, 5, 6}, {1, 7, 4},
{2, 4, 5}, {2, 6, 7}, {3, 4, 6}, {3, 5, 7}}. Then we set H = H_1 ∪ H_2 ∪ ... ∪
H_k, where H_1, H_2, ..., H_k are k disjoint copies of H_0. The dual of H is
the hypergraph of all 7^k unions obtained by taking one hyperedge from each of
the k copies of H_0 (H_1, H_2, ..., H_k). We finally obtain SDFP(n), which is
a hypergraph of 1 + 7k + 7^k hyperedges.
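The first four families are easy to generate and to check on small n. The
following sketch uses vertices 1..n; the function names are hypothetical, and
brute_force_dual is an exhaustive helper that is only usable for tiny
instances.

```python
from itertools import combinations, product
from typing import FrozenSet, List, Set


def brute_force_dual(edges: List[FrozenSet[int]], n: int) -> Set[FrozenSet[int]]:
    """All minimal hitting sets over vertices 1..n, by exhaustive search."""
    hitting = [frozenset(c)
               for r in range(n + 1)
               for c in combinations(range(1, n + 1), r)
               if all(f & frozenset(c) for f in edges)]
    return {h for h in hitting if not any(g < h for g in hitting)}


def matching(n: int) -> List[FrozenSet[int]]:
    """M(n): hyperedges {2i - 1, 2i} for i = 1..n/2."""
    return [frozenset({2 * i - 1, 2 * i}) for i in range(1, n // 2 + 1)]


def dual_matching(n: int) -> List[FrozenSet[int]]:
    """DM(n): one vertex from each pair {2i - 1, 2i}."""
    pairs = [(2 * i - 1, 2 * i) for i in range(1, n // 2 + 1)]
    return [frozenset(choice) for choice in product(*pairs)]


def threshold(n: int) -> List[FrozenSet[int]]:
    """TH(n): hyperedges {i, j} with 1 <= i < j <= n and j even."""
    return [frozenset({i, j}) for j in range(2, n + 1, 2) for i in range(1, j)]


def self_dual_threshold(n: int) -> List[FrozenSet[int]]:
    """SDTH(n), built from TH(n - 2) and its dual."""
    th = threshold(n - 2)
    return ([frozenset({n - 1, n})]
            + [e | {n - 1} for e in th]
            + [e | {n} for e in brute_force_dual(th, n - 2)])


sdth6 = self_dual_threshold(6)
```

For n = 6 these constructions give 3, 8, 9 and 8 hyperedges respectively,
matching the counts n/2, 2^(n/2), n^2/4 and (n - 2)^2/4 + n/2 + 1.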

5.3 Differences 

Before showing the results, we discuss the differences between the algorithms
from the viewpoint of algorithmic structure. Basically, the search spaces of
the DL, KS, and our RS algorithms are the same. However, DL and KS check the
same hitting sets many times, while RS processes each hitting set at most
once. In addition, our RS has a pruning method, thus the number of hitting
sets generated may be decreased. The search spaces of the HBC and DFS
algorithms are basically the same, but DFS reduces it by using pruning methods.

For the minimality check, DL, BMR, and HBC algorithms access basically 
all members in Dj. Basically, this takes 0(min{2l^l log HPH) time. Some 
heuristics can reduce the time, but the reduction ratio would be limited. In 



contrast, KS takes O(|S| × ||ℱ||) time, and RS takes O(|ℱ(v)|) time. Thus, we
can expect that

- DL, BMR, and HBC are faster when there are only a few minimal hitting sets,

- HBC is faster if the search space of DL is larger than the set of vertex
  subsets satisfying the minimality condition, for example, in the case that
  the sizes of minimal hitting sets are quite small,

- KS, RS, and DFS are faster when |ℱ| is small,

- RS is faster than KS when the sizes of minimal hitting sets are not small,
  and vertex unification (done by KS) does not work.

5.4 Results 

Tables 1-10 compare the computation times. In these tables, |ℱ| represents the
number of hyperedges, |F|* represents the average size of hyperedges,
|dual(ℱ)| represents the number of minimal hitting sets, and |S|* represents
the average size of minimal hitting sets. The computation time is in seconds.
Furthermore, "-" means that the computation time was more than 1000 seconds,
and "fail" implies that the computation did not terminate normally because of
a shortage of memory or some error.

Table 1. Computation time on the dataset of winning stage in Connect-4

w           100       200       400       800       1600      3200      6400      12800
BEGK        1.2       5.2       46        55        430       -         -         -
DL          0.005     0.061     1.6       6.2       180       -         -         -
BMR         0.006     0.044     0.52      0.67      17        710       -         -
HBC         33        -         -         -         -         -         -         -
KS          0.021     0.14      1.1       3.2       73        860       -         -
RS          0.001     0.005     0.032     0.078     0.41      4.7       20        83
DFS         0.001     0.006     0.021     0.056     0.27      2.6       11        48
|ℱ|         100       200       400       800       1600      3200      6400      12800
|F|*        8         8         8         8         8         8         8         8
|dual(ℱ)|   287       1145      6069      11675     71840     459502    1277933   11614885
|S|*        10.70     11.95     14.15     14.84     16.46     17.69     18.67     20.54



The computation time of algorithms which store minimal hitting sets, such
as DL and BMR, depends on |dual(ℱ)| and |S|*. On the other hand, the
computation time of the depth-first algorithms, such as KS, RS and DFS, depends



Table 2. Computation time on the dataset of losing stage in Connect-4

l           100       200       400       800       1600      3200      6400      12800
BEGK        4.7       51        110       340       -         -         -         -
DL          0.11      6.4       44        210       -         -         -         -
BMR         0.047     2.2       5.1       16        130       -         -         -
HBC         110       -         -         -         -         -         -         -
KS          0.057     2.6       4.6       20        97        -         -         -
RS          0.009     0.052     0.14      0.41      1.6       15        98        420
DFS         0.006     0.044     0.09      0.28      0.94      12        40        180
|ℱ|         100       200       400       800       1600      3200      6400      12800
|F|*        8         8         8         8         8         8         8         8
|dual(ℱ)|   2341      22760     33087     79632     212761    2396735   4707877   16405082
|S|*        11.19     12.43     13.59     14.62     15.73     17.06     17.41     19.09



Table 3. Computation time on matching graphs

M           20      24      28      32      36      40
BEGK        0.045   0.72    1.1     4.4     36      fail
DL          0.003   0.012   0.04    0.21    0.89    3.9
BMR         0.003   0.016   0.045   0.19    1.2     5.3
HBC         0.17    2.4     37      520     -       -
KS                  0.003   0.01    0.044   0.2     0.87
RS                  0.004   0.013   0.059   0.25    1.1
DFS         0.002   0.006   0.023   0.06    0.26    1.1
|ℱ|         10      12      14      16      18      20
|F|*        2       2       2       2       2       2
|dual(ℱ)|   2^10    2^12    2^14    2^16    2^18    2^20
|S|*        10      12      14      16      18      20

Table 4. Computation time on dual matching graphs

DM          20      24      28      32      36      40
BEGK        1.4     3.1     8.9     67      fail    fail
DL          0.01    0.054   0.25    1.2     7.1     70
BMR         0.038   0.4     4.2     49      540     -
HBC         0.21    3.3     57      900     -       -
KS          0.012   0.071   0.56    5.6     60      780
RS          0.007   0.054   0.5     4.8     50      -
DFS         0.014   0.075   0.64    6.8     73      -
|ℱ|         2^10    2^12    2^14    2^16    2^18    2^20
|F|*        10      12      14      16      18      20
|dual(ℱ)|   10      12      14      16      18      20
|S|*        2       2       2       2       2       2

Table 5. Computation time on threshold graphs

TH          40      80      120     160     200
BEGK        0.28    0.84    2.7     7.5     19
DL          0.004   0.027   0.091   0.24    0.52
BMR         0.009   0.15    0.6     2.6     6.6
HBC         -       -       -       -       -
KS          0.021   0.34    2.5     11      35
RS          0.001   0.003   0.016   0.019   0.048
DFS                 0.003   0.01    0.026   0.037
|ℱ|         400     1600    3600    6400    10000
|F|*        2       2       2       2       2
|dual(ℱ)|   21      41      61      81      101
|S|*        29.05   59.02   89.02   119.01  149.01

Table 6. Computation time on self-dual threshold graphs

SDTH        42      82      122     162     202
BEGK        0.53    3.3     27      110     310
DL          0.008   0.052   0.2     0.56    1.3
BMR         0.012   0.19    0.87    2.7     7.2
HBC         -       -       -       -       -
KS          0.057   1       6.3     25      74
RS          0.002   0.01    0.017   0.019   0.065
DFS         0.001   0.01    0.025   0.041   0.068
|ℱ|         422     1642    3662    6482    10102
|F|*        4.34    4.42    4.45    4.46    4.47
|dual(ℱ)|   422     1642    3662    6482    10102
|S|*        4.34    4.42    4.45    4.46    4.47


Table 7. Computation time on self-dual Fano-plane graphs

SDFP        9       16      23      30      37
BEGK        0.043   1.3     27      590     -
DL                  0.004   0.22    22      -
BMR         0.001   0.003   0.11    3.4     260
HBC                 0.023   3.2     540     -
KS                  0.002   0.032   0.64    26
RS                  0.001   0.022   0.39    16
DFS                         0.014   0.42    20
|ℱ|         15      64      365     2430    16843
|F|*        3.87    6.27    9.63    12.89   15.97
|dual(ℱ)|   15      64      365     2430    16843
|S|*        3.87    6.27    9.63    12.89   15.97

Table 8. Computation time on randomly generated instances

p           0.9       0.8       0.7       0.6
BEGK        64        510       -         -
DL          20        210       -         -
BMR         1.8       20        320       -
HBC         0.078     1.9       33        680
KS          3.1       37        290       -
RS          0.12      0.87      6.4       52
DFS         0.093     0.84      6.1       52
cRS         0.13      2.6       29        300
cDFS        0.087     1.8       21        250
|ℱ|         1000      1000      1000      1000
|F|*        45.056    39.898    35.024    29.953
|dual(ℱ)|   30429     364902    2509943   16809231
|S|*        3.75      4.88      5.94      7.31



Table 9. Computation time on all maximal frequent set from "accidents"

ac          200      150      130      110      90       70       50       30
BEGK        0.54     3.2      8.7      22       87       430      -        -
DL          0.004    0.042    0.28     0.98     4.8      31       270      -
BMR         0.008    0.041    0.074    0.17     2.3      5.7      21       140
HBC         0.004    0.018    0.064    0.16     0.95     3.4      19       170
KS          fail     fail     fail     fail     fail     fail     fail     fail
RS          0.001    0.011    0.02     0.052    0.26     0.78     3.3      32
DFS         0.002    0.013    0.034    0.05     0.23     0.76     3.2      28
cRS                  0.007    0.027    0.05     0.23     1.4      12       230
cDFS        0.001    0.005    0.019    0.051    0.18     0.95     8.4      170
|ℱ|         81       447      990      2000     4322     10968    32207    135439
|F|*        57.48    56.34    72.85    72.23    326.66   326.08   325.31   430.39
|dual(ℱ)|   253      1039     1916     3547     7617     17486    47137    185218
|S|*        2.57     3.77     4.25     4.73     5.09     5.70     6.46     7.32



Table 10. Computation time on all maximal frequent set from "BMS-WebView2"

bms2        800      500      400      200      100      50       30       20       10
BEGK        -        -        -        -        -        -        -        -        -
DL                   3.9      5.1      91       -        -        -        -        -
BMR         4.7      18       20       110      380      1000     -        -        -
HBC         0.066    0.2      0.31     1.1      3.5      8.8      23       37       87
KS          fail     fail     fail     fail     fail     fail     fail     fail     fail
RS          0.039    0.12     0.15     0.87     9.2      71       340      800      -
DFS         0.048    0.089    0.15     1.1      13       92       400      950      -
cRS         0.004    0.009    0.015    0.056    0.25     1        4        10       47
cDFS        0.003    0.007    0.012    0.053    0.25     1.1      4.4      12       62
|ℱ|         62       152      237      823      2591     6946     17315    30405    74262
|F|*        3338.68  3261.89  3338.18  3337.39  3336.36  3335.91  3335.23  3334.97  3334.19
|dual(ℱ)|   4616     16991    15993    89448    438867   1289303  2297560  3064937  4582209
|S|*        1.29     1.88     1.82     1.99     2.01     2.02     2.04     2.07     2.15



on |ℱ| and |F|*. In particular, RS and DFS are much faster than any other
algorithm in almost all instances, up to 10,000 times in some cases. The
exceptions are matching graphs and dual matching graphs; both are extreme
cases, of only a few small minimal hitting sets that can be easily found, and
of few small hyperedges. Straightforward algorithms are fast for these cases.
Also, HBC, cRS and cDFS are faster when the hypergraph is dense. BEGK is the
slowest in most instances; algorithms with smaller complexity are not always
faster. The KS algorithm embodies an idea to unify the isomorphic vertices
into one to reduce the number of iterations, but it seems that this is not
very effective in our experiments. In our extra experiments, such isomorphic
vertices exist in only a few iterations, thus the improvement brought about by
unifying them would be limited.

Note that in instance M, DL is not slow even though |dual(ℱ)| is very large.
The reason would be that, for any S, S ∩ F_i = ∅ holds for all i, and so the
minimality check would not be required at all. In several instances, BMR is
slower than DL, even though it is an improved version. The reason would be
that BMR uses up a lot of time in preprocessing.

Table 11 lists the average ratios of computation times relative to the case
without pruning. Each value is the average over all instances in the category,
and smaller values mean more improvement. In some cases the ratio is slightly
larger than 1.0; however, the pruning basically works well, especially for RS.
The reason that the pruning is not so effective for DFS is that DFS already
has a pruning method, thus the improvement is limited.

We also evaluated the total memory usage of each algorithm. The memory
usage mainly depends on the number of minimal hitting sets, thus we display
two extreme cases: dual matching graphs DM and the randomly generated
instances p. In the results, all algorithms use a lot of memory when |ℱ| is
large.



Table 11. Reduction ratio of computation time by pruning method

instance     w      l      M      DM     TH     SDTH   SDFP   p      ac     bms2
RS (all)     0.37   0.44   0.73   0.16   1.00   0.20   0.30   0.33   0.34   0.56
DFS (all)    0.98   1.09   1.08   1.03   0.86   1.03   0.46   0.94   0.73   1.01
RS (large)   0.19   0.19   0.96   0.11   0.77   0.12   0.15   0.29   0.17   0.33
DFS (large)  0.96   0.95   1.01   1.00   0.83   1.19   0.68   0.94   0.44   1.00



In particular, DL, BMR and HBC use more memory, since the STL uses much
memory for the sake of making variable operations more efficient. Our
algorithms and the KS algorithm are quite stable against increases in the
number of minimal hitting sets, while the others are quite sensitive. KS uses
2.3 megabytes of memory while ours use 12 megabytes. However, 12 megabytes are
used by the standard library (libc), thus basically the difference can be
ignored.



Table 12. Total memory requirement for dual matching graphs (megabytes)

DM          20      24      28      32      36      40
BEGK        51      51      58      65      fail    fail
DL          43      45      51      160     580     2300
BMR         21      24      41      110     610     -
HBC         25      76      710     7900    -       -
KS          1.9     3       8       25      94      300
RS          13      13      15      24      66      -
DFS         13      13      15      24      66      -
|ℱ|         2^10    2^12    2^14    2^16    2^18    2^20
|F|*        10      12      14      16      18      20
|dual(ℱ)|   10      12      14      16      18      20
|S|*        2       2       2       2       2       2

Table 13. Total memory requirement for randomly generated instances (megabytes)

p           0.9       0.8       0.7       0.6
BEGK        51        130       -         -
DL          49        120       -         -
BMR         30        100       660       -
HBC         23        63        850       13000
KS          2.3       2.3       2.3       -
RS          12        12        12        12
DFS         12        12        12        12
cRS         12        12        12        12
cDFS        12        12        12        12
|ℱ|         1000      1000      1000      1000
|F|*        45.056    39.898    35.024    29.953
|dual(ℱ)|   30429     364902    2509943   16809231
|S|*        3.75      4.88      5.94      7.31



6 Conclusion 

We proposed efficient algorithms for solving the dualization problem. The new
depth-first search type algorithms are based on reverse search and branch and
bound with a restricted search space. We also proposed an efficient minimality
condition check method that exploits a new concept called "critical
hyperedges". Computational experiments showed that our algorithms outperform
the existing ones in almost all cases, while using less memory even for very
large-scale problems with up to millions of hyperedges. In some cases, though,
our algorithms take a long time for the minimality check. Shortening this time
will be one of the future tasks. More efficient pruning methods are also an
interesting topic of future work.